
perf(player): coalesce _mirrorParentMediaTime writes #396

Merged

vanceingalls merged 2 commits into main from perf/p1-4-coalesce-mirror-parent-media-time on Apr 23, 2026
Conversation

@vanceingalls
Collaborator

@vanceingalls vanceingalls commented Apr 21, 2026

Summary

Coalesce writes to el.currentTime inside _mirrorParentMediaTime so a single jitter sample no longer triggers a parent-media seek. A drift correction now requires two consecutive samples above the threshold (MIRROR_DRIFT_THRESHOLD_SECONDS) before the player writes back. One-shot alignment paths (promoteToParentProxy, _onIframeMediaAdded) opt out via force: true so initial alignment stays immediate.

Why

Step P1-4 of the player perf proposal. _mirrorParentMediaTime is called every animation frame on parent media proxies. Even without true drift, browser internals report tiny jitter on currentTime reads — typically below 30 ms but occasionally crossing the threshold for a single frame. Writing to currentTime triggers a seek, which is expensive and invalidates pipeline buffers, which in turn makes the next frame's currentTime read jitter further. The result was unnecessary seek thrash on otherwise-aligned media.

By requiring two consecutive over-threshold samples, transient jitter is filtered out while real drift (a sustained offset) still corrects with only ~1 extra frame of latency. This eliminates the most common cause of dropped frames on the studio thumbnail grid.

What changed

  • Each _parentMedia entry gains a driftSamples counter that increments while the absolute drift is above MIRROR_DRIFT_THRESHOLD_SECONDS and resets to 0 on the first sample that falls back below it.
  • _mirrorParentMediaTime(el, opts) only writes back when driftSamples >= 2, except when opts.force === true.
  • promoteToParentProxy and _onIframeMediaAdded pass force: true so the first alignment after registration is still immediate (these are user-visible state transitions, not steady-state telemetry).
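The gate described above can be sketched as a standalone helper. This is a hypothetical rendering: only `driftSamples`, `MIRROR_DRIFT_THRESHOLD_SECONDS`, and the `force` option come from this PR; the threshold value, entry shape, and function name are illustrative.

```typescript
// Illustrative threshold — the real constant lives in hyperframes-player.ts.
const MIRROR_DRIFT_THRESHOLD_SECONDS = 0.03;

interface ParentMediaEntry {
  driftSamples: number; // consecutive over-threshold samples seen so far
}

// Returns true when the mirror loop should write back to el.currentTime.
function shouldWriteBack(
  entry: ParentMediaEntry,
  driftSeconds: number,
  opts: { force?: boolean } = {},
): boolean {
  if (opts.force === true) return true; // one-shot alignment paths bypass the gate
  if (Math.abs(driftSeconds) <= MIRROR_DRIFT_THRESHOLD_SECONDS) {
    entry.driftSamples = 0; // recovery resets the counter — a later spike starts fresh
    return false;
  }
  entry.driftSamples += 1;
  return entry.driftSamples >= 2; // require two consecutive over-threshold samples
}
```

Note the reset on recovery: a single in-threshold sample clears any accumulated count, which is the invariant the review below calls out.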

Test plan

  • 11 new unit/integration tests in hyperframes-player.test.ts covering:
    • Single-sample jitter does not trigger a write.
    • Two-sample sustained drift does trigger a write.
    • Trending drift correction (gradually increasing offset) is detected within 2 samples.
    • force: true override bypasses the sample requirement.
    • Out-of-range proxies (proxies whose source has been removed) do not throw.
    • Multiple proxies maintain independent counters — drift on one does not affect the other.
    • _promoteToParentProxy alignment is immediate.

Stack

Step P1-4 of the player perf proposal. Builds on P1-1 (shared adopted stylesheets) and P1-2 (scoped media observer). Together these three target the studio multi-player render path — P0-1* perf gate scenarios will pick up the wins automatically.

Collaborator

@jrusso1020 jrusso1020 left a comment


Right design. Two-sample gate absorbs single-tick jitter (bridge interval, GC pause, throttled rAF) without thrashing currentTime, and the force: true escape hatch preserves zero-tolerance semantics at ownership promotion and proxy initialization — the two moments where a misaligned audible sample would be perceptually unacceptable.

Counter-reset-on-threshold-recovery is the critical invariant and it's exercised directly (a sample coming back within threshold clears the counter, so a later spike can't piggy-back on stale state). Similarly, out-of-range samples zero the counter — a proxy dropping out of its active window and returning later starts fresh.

The integration test (_promoteToParentProxy invokes _mirrorParentMediaTime with force: true) is the kind of call-site pin I'd want on a subtle perf contract like this: without it, a future refactor could remove { force: true } from the promotion path and every test still passes even though ownership flips would audibly drift.

Worth-calling-out-but-not-blocking: MIRROR_REQUIRED_CONSECUTIVE_DRIFT_SAMPLES = 2 is a constant — if bridge cadence ever changes (the comment cites bridgeMaxPostIntervalMs: 80, so worst-case correction latency is ~160ms), 2 may need to be re-tuned. Consider a doc-comment cross-reference so whoever touches the bridge cadence sees the coupling.

Approved.

Rames Jusso

@vanceingalls vanceingalls force-pushed the perf/p1-4-coalesce-mirror-parent-media-time branch from 7600cd7 to 62e91f3 Compare April 22, 2026 00:43
@vanceingalls
Collaborator Author

@jrusso1020 — thanks for the review. The non-blocking observation is now addressed:

MIRROR_REQUIRED_CONSECUTIVE_DRIFT_SAMPLES = 2 is a constant — if bridge cadence ever changes (the comment cites bridgeMaxPostIntervalMs: 80, so worst-case correction latency is ~160ms), 2 may need to be re-tuned. Consider a doc-comment cross-reference so whoever touches the bridge cadence sees the coupling.

Done in ce9c0bb1 (docs(player,core): cross-reference drift-sample gate with bridge cadence). Both the constant in packages/player/src/hyperframes-player.ts and the bridge interval definition in packages/core/src/runtime/state.ts now carry mirrored doc-comments calling out the coupling and the worst-case correction latency, so a future change to either side surfaces the contract.
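Illustratively, the coupling those mirrored doc-comments encode looks like the following — the constant names and values come from this thread, but the exact comment wording in the repo may differ:

```typescript
/**
 * Consecutive over-threshold samples required before _mirrorParentMediaTime
 * writes back to el.currentTime.
 *
 * Coupled to `bridgeMaxPostIntervalMs` (packages/core/src/runtime/state.ts):
 * worst-case correction latency ≈ this constant × bridgeMaxPostIntervalMs.
 */
const MIRROR_REQUIRED_CONSECUTIVE_DRIFT_SAMPLES = 2;

// Mirrored note on the core side: changing this cadence changes the
// mirror loop's worst-case correction latency.
const bridgeMaxPostIntervalMs = 80;

const worstCaseCorrectionLatencyMs =
  MIRROR_REQUIRED_CONSECUTIVE_DRIFT_SAMPLES * bridgeMaxPostIntervalMs; // 2 × 80 = 160
```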

@vanceingalls vanceingalls force-pushed the perf/p1-2-scope-media-mutation-observer branch from 73bcd21 to 49c427b Compare April 22, 2026 22:20
@vanceingalls vanceingalls force-pushed the perf/p1-4-coalesce-mirror-parent-media-time branch from ce9c0bb to 5989014 Compare April 22, 2026 22:20
@vanceingalls vanceingalls force-pushed the perf/p1-2-scope-media-mutation-observer branch from 49c427b to 1a22a89 Compare April 22, 2026 22:36
@vanceingalls vanceingalls force-pushed the perf/p1-4-coalesce-mirror-parent-media-time branch from 5989014 to 1b731fb Compare April 22, 2026 22:36
@vanceingalls vanceingalls force-pushed the perf/p1-2-scope-media-mutation-observer branch from 1a22a89 to 64bfffe Compare April 22, 2026 22:41
vanceingalls added a commit that referenced this pull request Apr 22, 2026
Per reviewer feedback on #396: surface the coupling between the
mirror-loop's required-consecutive-drift-samples constant and the
timeline-control bridge's max post interval, in both directions, so a
future change to either side immediately points at the other.

The coupling is:
  worst_case_correction_latency_ms
    ≈ MIRROR_REQUIRED_CONSECUTIVE_DRIFT_SAMPLES × bridgeMaxPostIntervalMs

At today's values (2 × 80 ms = 160 ms) this stays under the perceptual
A/V re-sync tolerance. Doc-only change — no behavior delta.
@vanceingalls vanceingalls force-pushed the perf/p1-4-coalesce-mirror-parent-media-time branch from 1b731fb to b187b28 Compare April 22, 2026 22:41
@vanceingalls vanceingalls force-pushed the perf/p1-2-scope-media-mutation-observer branch from 64bfffe to bc1d71c Compare April 22, 2026 23:29
@vanceingalls vanceingalls force-pushed the perf/p1-4-coalesce-mirror-parent-media-time branch from b187b28 to 7a3c263 Compare April 22, 2026 23:29
@vanceingalls vanceingalls mentioned this pull request Apr 22, 2026
3 tasks
@vanceingalls vanceingalls force-pushed the perf/p1-4-coalesce-mirror-parent-media-time branch from 7a3c263 to f870a8f Compare April 22, 2026 23:42
@vanceingalls vanceingalls changed the base branch from perf/p1-2-scope-media-mutation-observer to graphite-base/396 April 22, 2026 23:46
Require two consecutive readings above MIRROR_DRIFT_THRESHOLD_SECONDS
before writing back to el.currentTime, eliminating seek thrash from
transient jitter on the parent timeline. Each _parentMedia entry
gains a driftSamples counter; promoteToParentProxy and
_onIframeMediaAdded pass force:true to keep one-shot alignments
immediate.

Adds 11 unit/integration tests covering jitter, trending drift, force
override, out-of-range proxies, multi-proxy independence, and
_promoteToParentProxy alignment.
@vanceingalls vanceingalls force-pushed the perf/p1-4-coalesce-mirror-parent-media-time branch from f870a8f to 2213f37 Compare April 22, 2026 23:47
@graphite-app graphite-app Bot changed the base branch from graphite-base/396 to main April 22, 2026 23:47
@vanceingalls vanceingalls reopened this Apr 23, 2026
@vanceingalls vanceingalls merged commit f906797 into main Apr 23, 2026
51 checks passed
Collaborator Author

Merge activity

vanceingalls added a commit that referenced this pull request Apr 23, 2026
… drift (#400)

## Summary

Second slice of `P0-1` from the player perf proposal: plugs the three steady-state scenarios — sustained playback FPS, scrub latency, and media-sync drift — into the perf gate that landed in #399. Adds the multi-video fixture they all share, wires three new shards into CI, and seeds one new baseline (`droppedFramesMax`).

## Why

#399 stood up the harness and proved it with a single load-time scenario. By itself that's enough to catch regressions in initial composition setup, but it can't catch the things players actually fail at in production:

- **FPS regressions** — a render-loop change that drops the ticker from 60 to 45 fps still loads fast.
- **Scrub latency regressions** — the inline-vs-isolated split (#397) is exactly the kind of code path where a refactor can silently push everyone back to the postMessage round trip.
- **Media drift** — runtime mirror logic (#396 in this stack) and per-frame scheduling tweaks can both cause video to slip out of sync with the composition clock without producing a single console error.

Each of these is a target metric in the proposal with a concrete budget. This PR turns those budgets into gated CI signals and produces continuous data for them on every player/core/runtime change.

## What changed

### Fixture — `packages/player/tests/perf/fixtures/10-video-grid/`

- `index.html`: 10-second composition, 1920×1080, 30 fps, with 10 simultaneously-decoding video tiles in a 5×2 grid plus a subtle GSAP scale "breath" on each tile (so the rAF/RVFC loops have real work to do without GSAP dominating the budget the decoder needs).
- `sample.mp4`: small (~190 KB) clip checked in so the fixture is hermetic — no external CDN dependency, identical bytes on every run.
- Same `data-composition-id="main"` host pattern as `gsap-heavy`, so the existing harness loader works without changes.

### `02-fps.ts` — sustained playback frame rate

- Loads `10-video-grid`, calls `player.play()`, samples `requestAnimationFrame` callbacks inside the iframe for 5 s.
- Crucial sequencing: install the rAF sampler **before** `play()`, wait for `__player.isPlaying() === true`, **then reset the sample buffer** — otherwise the postMessage round-trip ramp-up window drags the average down by 5–10 fps.
- FPS = `(samples − 1) / (lastTs − firstTs in s)`; uses rAF timestamps (the same ones the compositor saw) rather than wall-clock `setTimeout`, so we're measuring real frame production.
- Dropped-frame definition matches Chrome DevTools: gap > 1.5× (1000/60 ms) ≈ 25 ms = "missed at least one vsync."
- Aggregation across runs: `min(fps)` and `max(droppedFrames)` — worst case wins, since the proposal asserts a floor on fps and a ceiling on drops.
- Emits `playback_fps_min` (higher-is-better, baseline `fpsMin = 55`) and `playback_dropped_frames_max` (lower-is-better, baseline `droppedFramesMax = 3`).
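A sketch of the fps and dropped-frame arithmetic described above, assuming the sampler yields an array of rAF timestamps in milliseconds; the helper name is illustrative, the formulas mirror the bullets:

```typescript
// Summarize one run's rAF timestamps into the two metrics this scenario emits.
function summarizeRafSamples(timestampsMs: number[]): {
  fps: number;
  droppedFrames: number;
} {
  // DevTools-style drop definition: gap > 1.5 × (1000/60 ms) ≈ 25 ms.
  const DROP_GAP_MS = 1.5 * (1000 / 60);
  const first = timestampsMs[0];
  const last = timestampsMs[timestampsMs.length - 1];
  // (samples − 1) intervals over the elapsed wall time, in seconds.
  const fps = (timestampsMs.length - 1) / ((last - first) / 1000);
  let droppedFrames = 0;
  for (let i = 1; i < timestampsMs.length; i++) {
    if (timestampsMs[i] - timestampsMs[i - 1] > DROP_GAP_MS) droppedFrames++;
  }
  return { fps, droppedFrames };
}
```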

### `04-scrub.ts` — scrub latency, inline + isolated

- Loads `10-video-grid`, pauses, then issues 10 seek calls in two batches: first the synchronous **inline** path (`<hyperframes-player>`'s default same-origin `_trySyncSeek`), then the **isolated** path (forced by replacing `_trySyncSeek` with `() => false`, which makes the player fall back to the postMessage `_sendControl("seek")` bridge that cross-origin embeds and pre-#397 builds use).
- Inline runs first so the isolated mode's monkey-patch can't bleed back into the inline samples.
- Detection: a rAF watcher inside the iframe polls `__player.getTime()` until it's within `MATCH_TOLERANCE_S = 0.05 s` of the requested target. Tolerance exists because the postMessage bridge converts seconds → frame number → seconds, and that round-trip can introduce sub-frame quantization drift even for targets on the canonical fps grid.
- Timing: `performance.timeOrigin + performance.now()` in both contexts. `timeOrigin` is consistent across same-process frames, so `t1 − t0` is a true wall-clock latency, not a host-only or iframe-only stopwatch.
- Targets alternate forward/backward (`1.0, 7.0, 2.0, 8.0, 3.0, 9.0, 4.0, 6.0, 5.0, 0.5`) so no two consecutive seeks land near each other — protects the rAF watcher from matching against a stale `getTime()` value before the seek command is processed.
- Aggregation: `percentile(95)` across the pooled per-seek latencies from every run. With 10 seeks × 2 modes × 3 runs we get 30 samples per mode per CI shard, enough for a stable p95.
- Emits `scrub_latency_p95_inline_ms` (lower-is-better, baseline `scrubLatencyP95InlineMs = 33`) and `scrub_latency_p95_isolated_ms` (lower-is-better, baseline `scrubLatencyP95IsolatedMs = 80`).
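The pooled-p95 aggregation can be sketched with a nearest-rank percentile helper; the harness's actual implementation may use a different interpolation method, so treat this as one reasonable reading of the bullets above:

```typescript
// Nearest-rank percentile: the smallest value such that at least p% of
// the samples are <= it. Input order does not matter.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

With 30 pooled per-seek latencies per mode, `percentile(latencies, 95)` picks the 29th-smallest sample, so one outlier run is enough to move the gated number.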

### `05-drift.ts` — media sync drift

- Loads `10-video-grid`, plays 6 s, instruments **every** `video[data-start]` element with `requestVideoFrameCallback`. Each callback records `(compositionTime, actualMediaTime)` plus a snapshot of the clip transform (`clipStart`, `clipMediaStart`, `clipPlaybackRate`).
- Drift = `|actualMediaTime − ((compTime − clipStart) × clipPlaybackRate + clipMediaStart)|` — the same transform the runtime applies in `packages/core/src/runtime/media.ts`, snapshotted once at sampler install so the per-frame work is just subtract + multiply + abs.
- Sustain window is 6 s (not the proposal's 10 s) because the fixture composition is exactly 10 s long and we want headroom before the end-of-timeline pause/clamp behavior. With 10 videos × ~25 fps × 6 s we still pool ~1500 samples per run — more than enough for a stable p95.
- Same "reset buffer after play confirmed" gotcha as `02-fps.ts`: frames captured during the postMessage round-trip would compare a non-zero `mediaTime` against `getTime() === 0` and inflate drift by hundreds of ms.
- Aggregation: `max()` and `percentile(95)` across the pooled per-frame drifts. The proposal's max-drift ceiling of 500 ms is intentional — the runtime hard-resyncs when `|currentTime − relTime| > 0.5 s`, so a regression past 500 ms means the corrective resync kicked in and the viewer saw a jump.
- Emits `media_drift_max_ms` (lower-is-better, baseline `driftMaxMs = 500`) and `media_drift_p95_ms` (lower-is-better, baseline `driftP95Ms = 100`).
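The drift transform quoted above, as a standalone sketch — the clip-field names follow the PR text, but the snapshot object shape is an assumption:

```typescript
// Clip transform snapshotted once at sampler install (per the bullets above).
interface ClipTransform {
  clipStart: number;        // composition time at which the clip begins (s)
  clipMediaStart: number;   // media time corresponding to clipStart (s)
  clipPlaybackRate: number; // media seconds per composition second
}

// Per-frame work is just subtract + multiply + abs.
function driftSeconds(
  compTime: number,
  actualMediaTime: number,
  clip: ClipTransform,
): number {
  const expectedMediaTime =
    (compTime - clip.clipStart) * clip.clipPlaybackRate + clip.clipMediaStart;
  return Math.abs(actualMediaTime - expectedMediaTime);
}
```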

### Wiring

- `packages/player/tests/perf/index.ts`: add `fps`, `scrub`, `drift` to `ScenarioId`, `DEFAULT_RUNS`, the default scenario list (`--scenarios` defaults to all four), and three new dispatch branches.
- `packages/player/tests/perf/perf-gate.ts`: add `droppedFramesMax: number` to `PerfBaseline`. Other baseline keys for these scenarios were already seeded in #399.
- `packages/player/tests/perf/baseline.json`: add `droppedFramesMax: 3`.
- `.github/workflows/player-perf.yml`: three new matrix shards (`fps` / `scrub` / `drift`) at `runs: 3`. Same `paths-filter` and same artifact-upload pattern as the `load` shard, so the summary job aggregates them automatically.

## Methodology highlights

These three patterns recur in all three scenarios and are worth noting because they're load-bearing for the numbers we report:

1. **Reset buffer after play-confirmed.** The `play()` API is async (postMessage), so any samples captured before `__player.isPlaying() === true` belong to ramp-up, not steady-state. Both `02-fps` and `05-drift` clear `__perfRafSamples` / `__perfDriftSamples` *after* the wait. Without this, fps drops 5–10 and drift inflates by hundreds of ms.
2. **Iframe-side timing.** All three scenarios time inside the iframe (`performance.timeOrigin + performance.now()` for scrub, rAF/RVFC timestamps for fps/drift) rather than host-side. The iframe is what the user sees; host-side timing would conflate Puppeteer's IPC overhead with real player latency.
3. **Stop sampling before pause.** Sampler is deactivated *before* `pause()` is issued, so the pause command's postMessage round-trip can't perturb the tail of the measurement window.

## Test plan

- [x] Local: `bun run player:perf` runs all four scenarios end-to-end on the 10-video-grid fixture.
- [x] Each scenario produces metrics matching its declared `baselineKey` so `perf-gate.ts` can find them.
- [x] Typecheck, lint, format pass on the new files.
- [x] Existing player unit tests untouched (no production code changes in this PR).
- [ ] First CI run will confirm the new shards complete inside the workflow timeout and that the summary job picks up their `metrics.json` artifacts.

## Stack

Step `P0-1b` of the player perf proposal. Builds on:

- `P0-1a` (#399): the harness, runner, gate, and CI workflow this PR plugs new scenarios into.

Followed by:

- `P0-1c` (#401): `06-parity` — live playback frame vs. synchronously-seeked reference frame, compared via SSIM, on the existing `gsap-heavy` fixture from #399.
vanceingalls added a commit that referenced this pull request Apr 23, 2026
## Summary

Adds **scenario 06: live-playback parity** — the third and final tranche of the P0-1 perf-test buildout (`p0-1a` infra → `p0-1b` fps/scrub/drift → this).

The scenario plays the `gsap-heavy` fixture, freezes it mid-animation, screenshots the live frame, then synchronously seeks the same player back to that exact timestamp and screenshots the reference. The two PNGs are diffed with `ffmpeg -lavfi ssim` and the resulting average SSIM is emitted as `parity_ssim_min`. Baseline gate: **SSIM ≥ 0.95**.

This pins the player's two frame-production paths (the runtime's animation loop vs. `_trySyncSeek`) to each other visually, so any future drift between scrub and playback fails CI instead of silently shipping.

## Motivation

`<hyperframes-player>` produces frames two different ways:

1. **Live playback** — the runtime's animation loop advances the GSAP timeline frame-by-frame.
2. **Synchronous seek** (`_trySyncSeek`, landed in #397) — for same-origin embeds, the player calls into the iframe runtime's `seek()` directly and asks for a specific time.

These paths must agree. If they don't — different rounding, different sub-frame sampling, different state ordering — scrubbing a paused composition shows different pixels than a paused-during-playback frame at the same time. That's a class of bug that only surfaces visually, never in unit tests, and only at specific timestamps where many things are mid-flight.

`gsap-heavy` is a 10s composition with 60 tiles each running a staggered 4s out-and-back tween. At t=5.0s a large fraction of those tiles are mid-flight, so the rendered frame has many distinct, position-sensitive pixels — the worst-case input for any sub-frame disagreement. If the two paths produce identical pixels here, they'll produce identical pixels everywhere that matters.

## What changed

- **`packages/player/tests/perf/scenarios/06-parity.ts`** — new scenario (~340 lines). Owns capture, seek, screenshot, SSIM, artifact persistence, and aggregation.
- **`packages/player/tests/perf/index.ts`** — register `parity` as a scenario id, default-runs = 3, dispatch to `runParity`, include in the default scenario list.
- **`packages/player/tests/perf/perf-gate.ts`** — extend `PerfBaseline` with `paritySsimMin`.
- **`packages/player/tests/perf/baseline.json`** — `paritySsimMin: 0.95`.
- **`.github/workflows/player-perf.yml`** — add a `parity` shard (3 runs) to the matrix alongside `load` / `fps` / `scrub` / `drift`.

## How the scenario works

The hard part is making the two captures land on the *exact same timestamp* without trusting `postMessage` round-trips or arbitrary `setTimeout` settling.

1. **Install an iframe-side rAF watcher** before issuing `play()`. The watcher polls `__player.getTime()` every animation frame and, the first time `getTime() >= 5.0`, calls `__player.pause()` *from inside the same rAF tick*. `pause()` is synchronous (it calls `timeline.pause()`), so the timeline freezes at exactly that `getTime()` value with no postMessage round-trip. The watcher's Promise resolves with that frozen value as the canonical `T_actual` for the run.
2. **Confirm `isPlaying() === true`** via `frame.waitForFunction` before awaiting the watcher. Without this, the test can hang if `play()` hasn't kicked the timeline yet.
3. **Wait for paint** — two `requestAnimationFrame` ticks on the host page. The first flushes pending style/layout, the second guarantees a painted compositor commit. Same paint-settlement pattern as `packages/producer/src/parity-harness.ts`.
4. **Screenshot the live frame** — `page.screenshot({ type: "png" })`.
5. **Synchronously seek to `T_actual`** — call `el.seek(capturedTime)` on the host page. The player's public `seek()` calls `_trySyncSeek` which (same-origin) calls `__player.seek()` synchronously, so no postMessage await is needed. The runtime's deterministic `seek()` rebuilds frame state at exactly the requested time.
6. **Wait for paint** again, screenshot the reference frame.
7. **Diff with ffmpeg** — `ffmpeg -hide_banner -i reference.png -i actual.png -lavfi ssim -f null -`. ffmpeg writes per-channel + overall SSIM to stderr; we parse the `All:` value, clamp at 1.0 (ffmpeg occasionally reports 1.000001 on identical inputs), and treat it as the run's score.
8. **Persist artifacts** under `tests/perf/results/parity/run-N/` (`actual.png`, `reference.png`, `captured-time.txt`) so CI can upload them and so a failed run is locally reproducible. Directory is already gitignored via the existing `packages/player/tests/perf/results/` rule.
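Step 7's stderr parsing and clamping might look like the following sketch; the stderr line format assumed here is the `ssim` filter's usual summary shape, and the function name is illustrative:

```typescript
// Extract the overall SSIM from ffmpeg's ssim-filter stderr summary line,
// e.g. "[Parsed_ssim_0 @ ...] SSIM Y:0.998 U:0.999 V:0.999 All:0.999123 (30.1)".
function parseSsimAll(stderr: string): number | null {
  const match = stderr.match(/All:([\d.]+)/);
  if (!match) return null; // ffmpeg failed or the filter never ran
  // Clamp: ffmpeg occasionally reports slightly >1.0 on identical inputs.
  return Math.min(1.0, parseFloat(match[1]));
}
```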

### Aggregation

`min()` across runs, **not** mean. We want the *worst observed* parity to pass the gate so a single bad run can't get masked by averaging. Both per-run scores and the aggregate are logged.

### Output metric

| name              | direction        | baseline             |
|-------------------|------------------|----------------------|
| `parity_ssim_min` | higher-is-better | `paritySsimMin: 0.95` |

With deterministic rendering enabled in the runner, identical pixels produce SSIM very close to 1.0; the 0.95 threshold leaves headroom for legitimate fixture-level noise (font hinting, GPU compositor variance) while still catching any real disagreement between the two paths.

## Test plan

- `bun run player:perf -- --scenarios=parity --runs=3` locally on `gsap-heavy` — passes with SSIM ≈ 0.999 across all 3 runs.
- Inspected `results/parity/run-1/actual.png` and `reference.png` side-by-side — visually identical.
- Inspected `captured-time.txt` to confirm `T_actual` lands just past 5.0s (within one frame).
- Sanity test: temporarily forced a 1-frame offset between live and reference capture; SSIM dropped well below 0.95 as expected, confirming the threshold catches real drift.
- CI: `parity` shard added alongside the existing `load` / `fps` / `scrub` / `drift` shards; same `measure`-mode / artifact-upload / aggregation flow.
- `bunx oxlint` and `bunx oxfmt --check` clean on the new scenario.

## Stack

This is the top of the perf stack:

1. #393 `perf/x-1-emit-performance-metric` — performance.measure() emission
2. #394 `perf/p1-1-share-player-styles-via-adopted-stylesheets` — adopted stylesheets
3. #395 `perf/p1-2-scope-media-mutation-observer` — scoped MutationObserver
4. #396 `perf/p1-4-coalesce-mirror-parent-media-time` — coalesce currentTime writes
5. #397 `perf/p3-1-sync-seek-same-origin` — synchronous seek path (the path this PR pins)
6. #398 `perf/p3-2-srcdoc-composition-switching` — srcdoc switching
7. #399 `perf/p0-1a-perf-test-infra` — server, runner, perf-gate, CI
8. #400 `perf/p0-1b-perf-tests-for-fps-scrub-drift` — fps / scrub / drift scenarios
9. **#401 `perf/p0-1c-live-playback-parity-test` ← you are here**

With this PR landed the perf harness covers all five proposal scenarios: `load`, `fps`, `scrub`, `drift`, `parity`.
